Integer programming decoder for machine translation

ABSTRACT

A machine translation (MT) decoder may transform a translation problem into an integer programming problem, such as a Traveling Salesman Problem (TSP). The decoder may invoke an integer program (IP) solver to solve the integer programming problem and output a likely decoding based on the solution.

CROSS-REFERENCE TO RELATED APPLICATIONS

[0001] This application claims priority to U.S. Provisional Patent Application Serial No. 60/295,182, filed on May 31, 2001.

ORIGIN OF INVENTION

[0002] The research and development described in this application were supported by DARPA-ITO under grant number N66001-00-1-9814. The U.S. Government may have certain rights in the claimed inventions.

BACKGROUND

[0003] Machine translation (MT) concerns the automatic translation of natural language sentences from a first language (e.g., French) into another language (e.g., English). Systems that perform MT techniques are said to “decode” the source language into the target language.

[0004] One type of MT decoder is the statistical MT decoder. A statistical MT decoder that translates French sentences into English may include a language model (LM) that assigns a probability P(e) to any English string, a translation model (TM) that assigns a probability P(f|e) to any pair of English and French strings, and a decoder. The decoder may take a previously unseen sentence f and try to find the e that maximizes P(e|f), or equivalently maximizes P(e)·P(f|e).
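The equivalence stated above follows from Bayes' rule: P(e|f) = P(e)·P(f|e)/P(f), and P(f) is constant for a given input sentence. A minimal sketch of this selection rule is shown below. The candidate strings, the probability values, and the names `LM`, `TM`, and `decode` are hypothetical and serve only as an illustration.

```python
# Hypothetical toy scores for one fixed French sentence f (illustrative values only).
LM = {"it is not clear": 0.02, "it not is clear": 0.0001}  # P(e)
TM = {"it is not clear": 0.3, "it not is clear": 0.4}      # P(f | e)


def decode(candidates):
    """Return the candidate e that maximizes P(e) * P(f | e)."""
    return max(candidates, key=lambda e: LM[e] * TM[e])


best = decode(list(LM))
```

In this toy case the second candidate has the higher translation-model score, but the language model penalizes its word order, so the first candidate is selected.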

[0005] A “stack decoder” is a type of statistical MT decoder. In a stack decoder, possible translations are organized into a graph structure and then searched until an optimal solution (translation) is found. Although stack decoders tend to produce good results, they do so at a significant cost. Maintaining and searching a large potential solution space is expensive, both computationally and in terms of computer memory.

SUMMARY

[0006] A machine translation (MT) decoder may transform a translation problem into an integer programming problem, such as a Traveling Salesman Problem (TSP). The decoder may include a graph generator which generates a graph including a number of regions, or cities, corresponding to words in an input source language sentence or phrase. Each region may include a number of nodes, or hotels, corresponding to possible translations of the source language word corresponding to that region. The graph generator may use linguistic constraint information in a translation database to assign distances between hotels.

[0007] The decoder may invoke an integer program (IP) solver to find a shortest tour on the graph, i.e., solve the TSP. The decoder may output a likely decoding which includes the words corresponding to the hotels visited, aligned in the order the hotels were visited in the tour.

BRIEF DESCRIPTION OF THE DRAWINGS

[0008] FIG. 1 is a block diagram of a machine translation decoder.

[0009] FIG. 2 is a flowchart describing an integer programming decode operation 200.

[0010] FIG. 3 is a salesman graph for a translation problem.

[0011] FIG. 4 illustrates the results of a word alignment operation.

DETAILED DESCRIPTION

[0012] FIG. 1 illustrates a machine translation (MT) decoder 100 which utilizes a linear integer programming model to translate sentences in a source language (e.g., French) into a target language (e.g., English). The decoder 100 may transform a decoding problem into a linear integer programming problem.

[0013] A solution to an integer programming problem involves an assignment of variables. Solutions are constrained by inequalities involving linear combinations of variables. An optimal solution is one that respects the constraints and minimizes the value of the objective function, which is also a linear combination of variables.

[0014] One type of linear integer programming problem is the Traveling Salesman Problem (TSP). The hypothetical situation posed by the TSP concerns a salesman who spends his time visiting a number of cities (or nodes) cyclically. In one tour the salesman visits each city just once, and finishes up where he started. The Traveling Salesman Problem is this: given a finite number of “cities” along with the cost of travel between each pair of them (e.g., distance), find the cheapest (e.g., shortest) way of visiting all the cities and returning to the starting point.
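For a handful of cities, the problem statement above can be illustrated by exhaustive enumeration. The sketch below is not the IP solver contemplated in this description. It is a brute-force illustration, and the function name `shortest_tour` and the asymmetric cost matrix are hypothetical.

```python
from itertools import permutations


def shortest_tour(cost):
    """Try every ordering of cities 1..n-1, starting and ending at city 0."""
    n = len(cost)
    best_len, best_tour = float("inf"), None
    for perm in permutations(range(1, n)):
        tour = (0,) + perm
        # Sum the cost of each leg, including the return leg to the start.
        length = sum(cost[tour[k]][tour[(k + 1) % n]] for k in range(n))
        if length < best_len:
            best_len, best_tour = length, tour
    return best_len, best_tour


# Hypothetical asymmetric travel costs: cost[i][j] is the cost of going from i to j.
cost = [
    [0, 2, 9, 10],
    [1, 0, 6, 4],
    [15, 7, 0, 8],
    [6, 3, 12, 0],
]
result = shortest_tour(cost)
```

Enumeration grows factorially with the number of cities, which is why dedicated solvers are used for problems of realistic size.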

[0015] Much effort has been dedicated to the TSP, and powerful tools have been developed to solve TSP and other integer programming problems. For example, the Center for Research on Parallel Computation (CRPC), a National Science Foundation Science and Technology Center established in 1989, has solved the TSP for 13,509 U.S. cities with populations of more than 500 people. Such an integer program (IP) solver may be utilized by the decoder 100 to solve a translation problem posed as a TSP.

[0016] The decoder may transform a decoding problem into a TSP format and then use an IP solver 105 to generate a translation. FIG. 2 is a flowchart describing a decoding operation 200 according to an embodiment.

[0017] A graph generator 110 may express an MT decoding problem in a TSP format by constructing a salesman graph. FIG. 3 is a salesman graph 300 for the input sentence f=“CE NE EST PAS CLAIR .” (block 205). Each word in the observed sentence f may be represented as a city 305. City boundaries 310 are shown with bold lines. Each city may be populated with a number of hotels 315 corresponding to likely English word translations (block 210). The owner of a hotel is the English word inside the rectangle. If two cities have hotels with the same owner x, then a third hotel 320 owned by x (e.g., the English word “is”) may be built on the border of the two cities (in this case, the cities “CE” and “EST”). More generally, if n cities all have hotels owned by x, 2^n − n − 1 new hotels may be built, one for each non-empty, non-singleton subset of the cities, on various city borders and intersections. An extra city representing the sentence boundary may also be added to the salesman graph and serve as the starting point for the tour.
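The count of 2^n − n − 1 border hotels can be checked by enumerating every subset of two or more cities. The following is a minimal sketch. The function name `border_hotels` and the use of a `frozenset` of city names to represent a hotel location are assumptions made for illustration.

```python
from itertools import combinations


def border_hotels(cities_with_word):
    """List every subset of two or more cities that share a hotel owner.

    Each subset stands for one hotel built on the border of those cities.
    """
    cities = sorted(cities_with_word)
    return [
        frozenset(group)
        for size in range(2, len(cities) + 1)
        for group in combinations(cities, size)
    ]


# Two cities sharing the owner "is" yield a single border hotel.
two_city = border_hotels(["CE", "EST"])
# Three cities sharing an owner yield 2**3 - 3 - 1 = 4 border hotels.
three_city = border_hotels(["CE", "EST", "PAS"])
```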

[0018] A tour of cities may be defined as a sequence of hotels (starting at the sentence boundary hotel) that visits each city exactly once before returning to the starting point. If a hotel sits on the border between two cities, then staying at that hotel counts as visiting both cities. Each tour of cities corresponds to a potential decoding <e,a>, where “e” represents the words in the English string and “a” represents the alignment of the words. The owners of the hotels on the tour yield e, while the hotel locations yield a.

[0019] The distances between hotels may represent various constraints of an integer program. Real-valued (asymmetric) distances may be established between pairs of hotels such that the length of any tour is exactly the negative of log(P(e)·P(a,f|e)) (block 215). Because the logarithm function is monotonic, the shortest tour may correspond to the likeliest decoding.

[0020] The distance assigned to each pair of hotels may be based on a translation model 115, e.g., the IBM Model 4 formula, described in U.S. Pat. No. 5,477,451. The IBM Model 4 revolves around the notion of a word alignment over a pair of sentences, such as that shown in FIG. 4. A word alignment assigns a single home (English string position) to each French word. If two French words align to the same English word, then that English word is said to have a fertility of two. Likewise, if an English word remains unaligned-to, then it has fertility zero. Because the destination hotel “not” sits on the border between cities NE and PAS, it corresponds to a partial alignment in which the word “not” has fertility two:

[0021] Assuming that the price has already been paid for visiting the “what” hotel, then the inter-hotel distance need only account for the partial alignment concerning “not”:

[0022] distance =

[0023] −log(bigram(not|what)) // chance of “not” given the previous word was “what”

[0024] −log(n(2|not)) // chance that word “not” in English generates two French words

[0025] −log(t(NE|not)) − log(t(PAS|not)) // chance that “not” would translate to “ne” and “pas”

[0026] −log(d₁(+1|class(what),class(NE)))

[0027] −log(d_{>1}(+2|class(PAS))) // given that “what” translates to the first French word, what is the chance that “not” will translate to the second French word

[0028] These constraints, and corresponding constraints for different words, may be stored in a translation database 120.
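The distance for the “not” hotel is a sum of negative log probabilities, so it equals the negative log of the product of the individual model terms. The sketch below illustrates that arithmetic. The function name `hotel_distance`, its parameter names, and every probability value are hypothetical and are not actual model parameters.

```python
import math


def hotel_distance(bigram, fertility, translations, distortions):
    """Sum the negative logs of the model terms for one inter-hotel move."""
    probs = [bigram, fertility, *translations, *distortions]
    return sum(-math.log(p) for p in probs)


# Hypothetical values standing in for the terms listed above.
distance = hotel_distance(
    bigram=0.1,                # bigram(not | what)
    fertility=0.7,             # n(2 | not)
    translations=[0.4, 0.5],   # t(NE | not), t(PAS | not)
    distortions=[0.3, 0.6],    # d_1(+1 | ...), d_{>1}(+2 | ...)
)
```

Because every term is a probability no greater than one, each contribution to the distance is non-negative, and a more probable partial alignment yields a shorter distance.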

[0029] An infinite distance may be assigned in both directions between hotels that are located (even partially) in the same city because travel from one to the other can never be part of a tour.

[0030] NULL-owned hotels may be treated specially. All non-NULL hotels must be visited before any NULL hotels and at most one NULL hotel may be visited on a tour. Since only one NULL hotel is allowed to be visited, the fertility of the NULL word is simply the number of cities that hotel straddles, and the length of f is the number of cities minus one.

[0031] The tour selection may be cast as an integer programming problem (block 225). A binary (0/1) integer variable $x_{ij}$ may be created for each pair of hotels i and j. The value $x_{ij}$ equals 1 if and only if travel from hotel i to hotel j is on the itinerary. The objective function is then:

$\text{minimize}: \sum_{(i,j)} x_{ij} \cdot \text{distance}(i,j)$

[0032] This minimization may be subject to three classes of constraints. First, every city must be visited exactly once. That means exactly one tour segment must exit each city:

$\forall c \in \text{cities}: \sum_{i \text{ located at least partially in } c} \; \sum_{j} x_{ij} = 1$

[0033] Second, the segments must be linked to one another, i.e., every hotel has either (a) one tour segment coming in and one going out, or (b) no segments in and none out. To put it another way, every hotel must have an equal number of tour segments going in and out:

$\forall i: \sum_{j} x_{ij} = \sum_{j} x_{ji}$

[0034] Third, to prevent multiple independent sub-tours, require that every proper subset of cities have at least one tour segment leaving it:

$\forall s \subset \text{cities}: \sum_{i \text{ located entirely within } s} \; \sum_{j \text{ located at least partially outside } s} x_{ij} \geq 1$

[0035] There may be an exponential number of constraints in this third class.
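The role of the third constraint class can be seen by checking a candidate assignment of the variables against all three classes. The sketch below is a checker only, not an IP solver. The function name `check_constraints`, the data layout, and the four-city example are hypothetical. For simplicity, the example assumes every hotel lies in exactly one city (no border hotels).

```python
from itertools import combinations


def check_constraints(hotel_cities, edges):
    """Check the three constraint classes for the segments with x_ij = 1.

    hotel_cities maps each hotel to the frozenset of cities it lies in;
    edges is the set of (i, j) pairs selected for the itinerary.
    """
    cities = set().union(*hotel_cities.values())
    # Class 1: exactly one selected segment exits each city.
    exits_ok = all(
        sum(1 for i, _ in edges if c in hotel_cities[i]) == 1 for c in cities
    )
    # Class 2: every hotel has as many segments coming in as going out.
    balance_ok = all(
        sum(1 for i, _ in edges if i == h) == sum(1 for _, j in edges if j == h)
        for h in hotel_cities
    )
    # Class 3: every non-empty proper subset of cities has a segment leaving it.
    linked_ok = True
    for size in range(1, len(cities)):
        for subset in combinations(sorted(cities), size):
            s = set(subset)
            leaving = sum(
                1 for i, j in edges
                if hotel_cities[i] <= s and not hotel_cities[j] <= s
            )
            if leaving < 1:
                linked_ok = False
    return exits_ok, balance_ok, linked_ok


hotels = {h: frozenset({h.upper()}) for h in "abcd"}
single_tour = {("a", "b"), ("b", "c"), ("c", "d"), ("d", "a")}
two_subtours = {("a", "b"), ("b", "a"), ("c", "d"), ("d", "c")}
```

With these inputs, `single_tour` satisfies all three classes, while `two_subtours` satisfies the first two and fails only the third. The loop over subsets also makes the exponential growth of the third class visible.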

[0036] Once cast as an integer program, the IP solver may be invoked. Exemplary IP solvers include lp_solve, available free of charge at ftp://ftp.ics.ele.tue.nl/pub/lp_solve, and CPLEX, available from ILOG, Inc. of Mountain View, Calif.

[0037] Mnemonic names may be assigned to the variables, and <e,a> extracted from the list of variables and their binary values. The sentence corresponding to the shortest tour may then be output (block 235). For example, the shortest tour 350 for the graph 300 in FIG. 3 corresponds to the optimal decoding: “it is not clear.” A second-best decoding can be obtained by adding a new constraint to the integer programming problem to stop the IP solver 105 from choosing the same solution again.
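One possible form for such a constraint, assumed here for illustration and not specified in this description, requires that the sum of $x_{ij}$ over the segments of the previous solution be at most the number of those segments minus one. The sketch below applies that exclusion to a brute-force search over a small ordinary TSP. The function name `best_tour` and the cost matrix are hypothetical, and the search stands in for the IP solver.

```python
from itertools import permutations


def best_tour(cost, excluded=()):
    """Return (length, tour, edges) for the shortest tour not ruled out.

    Each entry of excluded is the edge set of an earlier solution; a tour
    is ruled out if it reuses every segment of such a solution.
    """
    n = len(cost)
    best = None
    for perm in permutations(range(1, n)):
        tour = (0,) + perm
        edges = frozenset((tour[k], tour[(k + 1) % n]) for k in range(n))
        # Exclusion constraint: at most len(old) - 1 segments may be shared.
        if any(len(edges & old) > len(old) - 1 for old in excluded):
            continue
        length = sum(cost[i][j] for i, j in edges)
        if best is None or length < best[0]:
            best = (length, tour, edges)
    return best


# Hypothetical asymmetric distances between four hotels.
cost = [
    [0, 2, 9, 10],
    [1, 0, 6, 4],
    [15, 7, 0, 8],
    [6, 3, 12, 0],
]
first = best_tour(cost)
second = best_tour(cost, excluded=[first[2]])
```

Repeating the call with both earlier edge sets in `excluded` would yield a third-best tour in the same way.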

[0038] A number of embodiments have been described. Nevertheless, it will be understood that various modifications may be made without departing from the spirit and scope of the invention. For example, blocks in the flowchart may be skipped or performed out of order and still produce desirable results. Accordingly, other embodiments are within the scope of the following claims.

1. A method comprising: transforming a translation problem into an integer programming problem; and generating a translation in response to solving the integer programming problem.
2. The method of claim 1, wherein said generating a translation comprises generating a text segment in a target language corresponding to a translation of a text segment in a source language.
3. The method of claim 1, wherein said transforming comprises generating a graph comprising a plurality of regions, each region including one or more nodes.
4. The method of claim 3, further comprising: inputting a text segment including words in a source language, wherein each of said plurality of regions represents a corresponding one of the words in the source language, and wherein the one or more nodes in each of said plurality of regions represent possible translations of the word corresponding to said region.
5. The method of claim 4, further comprising: assigning a distance between each of said nodes, said distance comprising one or more linguistic constraints.
6. The method of claim 5, wherein said generating a translation comprises finding a shortest tour including one node in each of said plurality of regions.
7. The method of claim 6, wherein said graph further comprises a region corresponding to a sentence boundary.
8. The method of claim 7, wherein said tour begins and ends in the region corresponding to the sentence boundary.
9. The method of claim 1, wherein said transforming comprises transforming the translation problem into a traveling salesman problem.
10. Apparatus comprising: a transformation module operative to transform an input text segment in a source language into an integer programming problem; and an integer program solver operative to solve said integer programming problem.
11. The apparatus of claim 10, further comprising: a database including a plurality of linguistic constraints for a target language and a plurality of words in the target language corresponding to possible translations for a plurality of words in a source language.
12. The apparatus of claim 11, wherein the transformation module comprises a graph generator operative to generate a graph including a plurality of regions and a plurality of nodes, each region corresponding to a word in an input source language text segment and each node corresponding to a possible translation of a word in the input source language, and to assign a distance between nodes based on said plurality of linguistic constraints.
13. The apparatus of claim 12, wherein the integer program solver is operative to find a shortest tour including one node in each of said plurality of regions.
14. The apparatus of claim 13, wherein the integer program solver is further operative to output a text segment in the target language, said text segment including a plurality of words corresponding to the nodes in the shortest tour.
15. The apparatus of claim 14, wherein the plurality of words are aligned in an order corresponding to an order of nodes visited in the tour.
16. An article comprising a machine-readable medium including machine-executable instructions, the instructions operative to cause the machine to: transform a translation problem into an integer programming problem; and generate a translation in response to solving the integer programming problem.
17. The article of claim 16, wherein the instructions operative to cause the machine to generate a translation include instructions operative to cause the machine to generate a text segment in a target language corresponding to a translation of a text segment in a source language.
18. The article of claim 16, wherein the instructions operative to cause the machine to transform include instructions operative to cause the machine to generate a graph comprising a plurality of regions, each region including one or more nodes.
19. The article of claim 18, further comprising instructions operative to cause the machine to: input a text segment including words in a source language, wherein each of said plurality of regions represents a corresponding one of the words in the source language, and wherein the one or more nodes in each of said plurality of regions represent possible translations of the word corresponding to said region.
20. The article of claim 19, further comprising instructions operative to cause the machine to: assign a distance between each of said nodes, said distance comprising one or more linguistic constraints.
21. The article of claim 20, wherein the instructions operative to cause the machine to generate a translation include instructions operative to cause the machine to find a shortest tour including one node in each of said plurality of regions.
22. The article of claim 21, wherein said graph further comprises a region corresponding to a sentence boundary.
23. The article of claim 22, wherein said tour begins and ends in the region corresponding to the sentence boundary.
24. The article of claim 16, wherein the instructions operative to cause the machine to transform include instructions operative to cause the machine to transform the translation problem into a traveling salesman problem.